Introduction to Plotly¶



ONS / NISR
2021

What is plotly¶

  • Plotly is an open-source graphing library which allows users to easily create highly customisable interactive and static charts in JavaScript, Python or R.

  • Plotly was built using Python and the Django framework, with a front end built on JavaScript (including the visualization library D3.js), HTML and CSS.

  • Plotly allows for the creation of professional-looking charts for both online and offline reports.

Plotly vs plotly express¶

  • Plotly.express is a high-level API for plotly.py, which allows users to create charts quickly without having to dig into the plotly figure objects and understand their components. Plotly.express for Python is similar to the standard plotly library for R in terms of "grammar" and relies on being fed "tidy" (long) data.

  • The higher-level plotly.express and the standard plotly libraries each have their pros and cons, and different users will prefer one over the other. The beauty is that any plotly figure (regardless of the language in which it was created) can be exported to JSON (JavaScript Object Notation) and used to recreate the same chart in any compatible language for editing etc. in very few steps.

Installing plotly¶

Plotly can be installed using pip. Remember to restart the kernel once the install has completed!

!pip install -U plotly
Requirement already satisfied: plotly in /opt/miniconda3/lib/python3.9/site-packages (5.3.1)
Collecting plotly
  Downloading plotly-5.5.0-py2.py3-none-any.whl (26.5 MB)
     |████████████████████████████████| 26.5 MB 1.7 MB/s eta 0:00:01
Requirement already satisfied: six in /opt/miniconda3/lib/python3.9/site-packages (from plotly) (1.16.0)
Requirement already satisfied: tenacity>=6.2.0 in /opt/miniconda3/lib/python3.9/site-packages (from plotly) (8.0.1)
Installing collected packages: plotly
  Attempting uninstall: plotly
    Found existing installation: plotly 5.3.1
    Uninstalling plotly-5.3.1:
      Successfully uninstalled plotly-5.3.1
Successfully installed plotly-5.5.0

Plotly Express¶

Importing data¶

First, let's import some data to play with using the cars.csv file in the data folder.

import pandas as pd 

cars = pd.read_csv('./data/cars.csv')
cars
      Make                    Model   Type  Origin  DriveTrain   MSRP  Invoice  EngineSize  Cylinders  Horsepower  MPG_City  MPG_Highway  Weight  Wheelbase  Length
  0  Acura                      MDX    SUV    Asia         All  36945    33337         3.5        6.0         265        17           23    4451        106     189
  1  Acura           RSX Type S 2dr  Sedan    Asia       Front  23820    21761         2.0        4.0         200        24           31    2778        101     172
  2  Acura                  TSX 4dr  Sedan    Asia       Front  26990    24647         2.4        4.0         200        22           29    3230        105     183
  3  Acura                   TL 4dr  Sedan    Asia       Front  33195    30299         3.2        6.0         270        20           28    3575        108     186
  4  Acura               3.5 RL 4dr  Sedan    Asia       Front  43755    39014         3.5        6.0         225        18           24    3880        115     197
...
423  Volvo  C70 LPT convertible 2dr  Sedan  Europe       Front  40565    38203         2.4        5.0         197        21           28    3450        105     186
424  Volvo  C70 HPT convertible 2dr  Sedan  Europe       Front  42565    40083         2.3        5.0         242        20           26    3450        105     186
425  Volvo               S80 T6 4dr  Sedan  Europe       Front  45210    42573         2.9        6.0         268        19           26    3653        110     190
426  Volvo                      V40  Wagon  Europe       Front  26135    24641         1.9        4.0         170        22           29    2822        101     180
427  Volvo                     XC70  Wagon  Europe         All  35145    33112         2.5        5.0         208        20           27    3823        109     186

428 rows × 15 columns

Plotly Express basic usage¶

By convention, plotly.express is imported under the alias px, using the code below.

import plotly.express as px

This gives us access to many functions for creating different charts. Each has its own set of arguments that can be altered to change the appearance and behaviour of the charts that are produced.

Scatter plot¶

The following code generates a simple x-y scatter plot using the plotly.express.scatter() function.

The first argument of this function is the pandas dataframe which holds the data we wish to visualize (in our case cars).

The arguments x and y then take as their values strings matching the column name in the dataframe that we want to plot as the x and y coordinates, respectively. In this case we want to plot Horsepower against MPG_City.

# assign the created scatter plot to the object 'scatter'
scatter = px.scatter(cars, # dataset object
                     x = 'Horsepower', # variable to show on x axis
                     y = 'MPG_City',  # variable to show on y axis
                     )

# display the scatter plot inline in the notebook by making its name the last line of the cell
scatter

That was easy, and it seems that more Horsepower results in a lower MPG_City... but we may want to change the colours and sizes of the points based on variables in the dataframe to gain further insight.

scatter = px.scatter(cars, 
                     x = 'Horsepower',
                     y = 'MPG_City',
                     color = 'Type', # color by 'Type', also creates a legend
                     size = 'EngineSize', # size points by 'EngineSize'
                     title = 'My first px scatter plot!', # Add a title! (HTML tags such as <b> can be used to make it bold)
                    )
scatter

Line chart¶

Creating a line chart is just as easy as creating a scatter chart; we simply use the function plotly.express.line() instead.

line = px.line(cars[cars['Type']=='SUV'].sort_values('Horsepower'), # dataset object
               x = 'Horsepower', # variable to show on x axis
               y = 'MPG_City',  # variable to show on y axis
               color = 'Origin'
               )
line

Bar chart¶

We can also create bar charts just as easily, using the plotly.express.bar() function.

It is important to note that each function may have a few arguments unique to it, so simply changing scatter to bar in the function call may result in an error being thrown. It may work though!

bar = px.bar(cars, 
             x = 'Origin',
             y = 'MSRP',
             color = 'Type',
             barmode = 'group', # this is a new argument, it places the bars for each 'Type' side by side
             title = 'My first px bar plot!'
            )
bar

This shows what we want, but it looks a bit odd because each row of the data is drawn as its own bar rather than being aggregated. The hover label reflects this, showing only the value closest to the cursor rather than the total MSRP, for example.

Histogram¶

We can do a little better with the plotly.express.histogram() function, which aggregates the data by group and plots the total (or average, min or max) MSRP of each vehicle type by region of manufacture.

hist = px.histogram(cars, 
                    x = 'Origin',
                    y = 'MSRP',
                    histfunc = 'sum', # sum the MSRP by color group and split by x
                    color = 'Type',
                    barmode = 'group',
                    title = 'Total MSRP of vehicle types by region of manufacture'
                    )
hist

Pie chart¶

Pie charts aren't loved by everyone, but they're simple enough to create using plotly. Examples can be found at https://plotly.com/python/pie-charts/

We'll use a simple fake dataset to illustrate the use of pie charts.

pie_data = pd.DataFrame({'Category': ['Research','Teaching','Estates','Support','Climate'],
                        'Expenditure': [4500, 2500, 1000, 500, 500]})

pie = px.pie(pie_data, names='Category', values='Expenditure', hole=0) # names sets the slice labels
pie

Problems 1¶

Problem 1.1¶

Using the dataset cars create a scatter plot of EngineSize vs Horsepower. Make the size of the point proportional to the weight of the car and colour the point based on which part of the world the car was built in.

Problem 1.2¶

Create a bar chart showing the average Horsepower by car Type. Split and colour the bars by the Origin of the car.

Problem 1.3¶

  • 1.3.1 Find the total weight of the cars in each part of the world.
  • 1.3.2 Create a donut chart of these total weights and "pull out" the slice of the donut associated with the USA.

Hint: The documentation for the px.pie() function will be helpful here.

NISR Style¶

We've seen that plotly charts can be styled in lots of different ways. However, in most instances it would be nice if plotly could just work out all the appropriate styles for us. That way we don't have to worry about manually styling every chart that we make.

Enter nisr_style

nisr_style is a Python library that, once imported, will automatically style all the charts that you make. It will ensure that everyone produces charts that look the same without having to manually update the styles!

Installing NISR Style¶

nisr_style only exists on NISR's GitHub repository. To install it we need access to that repository; fortunately, you all should have it.

nisr_style can be installed like any other Python package using pip.

!pip install git+https://github.com/NISR-analysis/ds-styleguide.git

Note that we need to use git+ and then the URL of the repository that contains nisr_style.

Using nisr_style¶

Once we've installed the library we just need to import nisr_style and any plotly chart that we make will be in the house style.

import pandas as pd 
import plotly.express as px 
import nisr_style 

cars = pd.read_csv('./data/cars.csv')

scatter = px.scatter(cars, 
                     x = 'Horsepower',
                     y = 'MPG_City',
                     color = 'Type', # color by 'Type', also creates a legend
                     size = 'EngineSize', # size points by 'EngineSize'
                     title = 'My first px scatter plot!', # Add a title! (HTML tags such as <b> can be used to make it bold)
                    )
scatter

This library will automatically theme all the different chart types available in plotly, for example here is our pie chart again.

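cars = pd.read_csv('./data/cars.csv')
# total weight of the cars from each region (rows sorted as Asia, Europe, USA)
total_weight = cars.groupby('Origin')[['Weight']].sum().reset_index()

pie = px.pie(total_weight, names='Origin', values='Weight', hole=0.5) # hole > 0 gives a donut
pie.update_traces(pull=[0,0,0.3]) # pull out the third slice (USA)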

pie

Exporting and saving charts¶

Static image¶

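It's possible to export any plotly chart as a static image, such as a .png or .pdf, for use in other documents by using the write_image method. Note that static image export relies on an additional package (typically kaleido), which may need to be installed separately.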

import pandas as pd 
import plotly.express as px 
import nisr_style 

cars = pd.read_csv('./data/cars.csv')

scatter = px.scatter(cars, 
                     x = 'Horsepower',
                     y = 'MPG_City',
                     color = 'Type', # color by 'Type', also creates a legend
                     size = 'EngineSize', # size points by 'EngineSize'
                     title = 'My first px scatter plot!', # Add a title! (HTML tags such as <b> can be used to make it bold)
                    )
scatter.write_image('./scatter.png', scale=4)

Interactive chart¶

Saving a plotly chart as a static image means that we lose interactivity. It is possible to save a chart with the interactivity. This can be useful if you want to send the chart to someone, but don't want them to have to run a notebook in order to see it. To do this we use the write_html method.

scatter = px.scatter(cars, 
                     x = 'Horsepower',
                     y = 'MPG_City',
                     color = 'Type', # color by 'Type', also creates a legend
                     size = 'EngineSize', # size points by 'EngineSize'
                     title = 'My first px scatter plot!', # Add a title! (HTML tags such as <b> can be used to make it bold)
                    )
scatter.write_html('./scatter.html')